(57) ABSTRACT

The present invention provides a hypermedia database for 
managing bookmarks, which allows a user to organize 
hypertext documents for querying, navigating, sharing and 
viewing. In addition, the hypermedia database also provides 
access control to the information in the database. The 
hypermedia database of the present invention parses meta- 
data from bookmarked documents and indexes and classifies 
the documents. The present invention supports advanced 
query and navigation of a collection of bookmarks, espe- 
cially providing various personalized bookmark services. In 
one embodiment, the present invention utilizes a proxy 
server to observe a user's access patterns to provide useful 
personalized services, such as automated URL 
bookmarking, document refresh, and bookmark expiration.
In addition, a user may also specify various preferences in
bookmark management, e.g., ranking schemes (i.e., by
referral, access frequency, or popularity) and navigation tree
fan-out. A subscription service which retrieves new or
updated documents of user-specified interests is also
provided.

44 Claims, 20 Drawing Sheets 
[Drawing: PowerBookmarks screen with tabs OPEN URL, SEARCH, BOOKMARKS, SUBSCRIPTIONS and CRAWLING, showing "Search Result for web crawl" with hits grouped by site ("UNIT"); each hit lists a title, a text excerpt, and a line giving score, size in bytes, date and URL.]
[Drawing: PowerBookmarks screen with tabs OPEN URL, INTERNET SEARCH, BOOKMARKS, SUBSCRIPTIONS and CRAWLING, showing "Samples of documents to be crawled"; each sample lists a title, a text excerpt, and a line giving score, size in bytes, date and URL.]
[Drawing: page reading "Click on the desired Newsgroups below (first column) to list newsgroups relevant to that Library of Congress Classification (LCC) Category, and on the desired LCC ID below (second column) to jump to that part of the LCC," followed by a table with columns "Newsgroups" and "LCC ID" whose legible rows include "1040-1060.2 Cycling, Bicycling, Motorcycling," "840.7-857 Winter Sports: Ice Hockey, Skiing, Bobsledding, etc." and "305-307 Driving."]
FIG. 13

[Drawing: list of classification category paths, including "Computers and Internet/Internet/World Wide Web/Databases and Searching/Web Directories" and "Computers and Internet/Internet/World Wide Web/Databases and Searching"; the remaining paths are only partly legible.]
SYSTEM FOR PERSONALIZING,
ORGANIZING AND MANAGING WEB 
INFORMATION 

CROSS REFERENCE TO RELATED 
APPLICATIONS 

The present application is related to copending U.S. patent
application ("the '759 Patent Application"), entitled 
"Advanced Web Bookmark Database System," Ser. No. 
09/184,759, filed on Nov. 2, 1998, pending and assigned to 
NEC USA, Inc., which is also the Assignee of the present 
invention. The disclosure of the '759 Patent Application is 
hereby incorporated by reference in its entirety. 

The present Application is also related to U.S. patent 
application (the "Navigation Trees Patent Application"), 
entitled "Personalized Navigation Trees," Ser. No. 09/274,814,
U.S. Pat. No. 6,393,427, filed on the same day as the
present Application, and assigned to the Assignee of the 
present invention. The disclosure of the Navigation Trees 
Patent Application is hereby incorporated by reference in its 
entirety. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to knowledge retrieval, 
management and processing on the world wide web and 
intranets. In particular, the present invention relates to 
personalizing, organizing and managing information on the 
world wide web and intranets. 

2. Discussion of the Related Art 

Users of the world wide web ("web") suffer information
overload. The web has no aggregate structure for organizing
information into distinct web localities, nor does a user have
a global view of the entire web from which to effectively
retrieve relevant pages. In fact, a recent survey of 11,700
web users indicates that 30.31% of the surveyed users report
encountering problems in "finding known information." In
the same survey, 27.80% and 12.16% of the surveyed users
report, as significant problems, organizing collected
information and finding pages already visited, respectively.

Another study focused on bookmark usage indicates that
most users gradually build a small sized archive. 68% of the
surveyed users have 11 to 100 bookmarks and over 93% of
the surveyed users create 0 to 5 bookmarks in each browsing
session. The study also found that a larger archive requires
a more sophisticated organization, such as automatically
classifying bookmarks according to the contents of the
documents they mark. An empirical study on users' patterns
of revisiting web pages found that 58% of the web pages a
typical individual accesses are revisits.

These studies suggest a need for a tool that allows a user
to build and organize a larger collection of bookmarks than
he or she can reasonably manually maintain now.

SUMMARY OF THE INVENTION

The present invention provides a bookmark system having
access to a computer network. Such a bookmark system
includes (a) an interface to the computer network; (b) a
database management system; and (c) a bookmark management
system coupled to the database and the interface. In the
bookmark system, the bookmark management system creates
and maintains in the database document records
("bookmarks") containing information for locating documents
in the computer network, and retrieves documents,
when needed, from the computer network over the interface.

According to one aspect of the invention, the bookmark
system includes a document classification system for
associating documents of the bookmark system with one or more
categories. The classification system may access a classifier
program on the computer network through the interface. The
bookmark system accesses the computer network through a
proxy server. In one embodiment, the database system
accesses a lexical dictionary for retrieving a list of keywords
that relate to a document. The proxy server can be used to
monitor an access pattern for a document and to record the
identity of the user accessing the document.

According to another aspect of the present invention, the
bookmark system classifies a document into one of many
categories, each category being a leaf node of a hierarchical
classification or navigation tree. In one embodiment, each
category preferably includes fewer than a predetermined
number of documents. When the number of documents in an
existing node exceeds the predetermined number of
documents, the existing node is split into child nodes.
Conversely, the child nodes of a parent node in the navigation
tree are merged when the documents in the child nodes
sum to fewer than the predetermined number.

According to another aspect of the present invention, the
bookmark management system associates one or more
user-specific records, and one or more owner-specific records,
with each document record. The owner-specific records allow
the owner of each bookmark to specify whether or not the
bookmark is to be shared, thereby implementing access
control. More than one owner-specific or user-specific
record can be associated with a single document record. The
bookmark management system need store only one bookmark
per document. In addition, the bookmark management
system can present to a user a customized view of the
bookmark.

In accordance with another aspect of the invention, the
bookmark system automatically creates a bookmark for a
user or for the system when a document is accessed at a high
enough frequency over a period of time. In one embodiment,
the "connectedness" of a document (i.e., the number of links
into the document and referred by the document) provides a
measure to assist in selecting bookmarks to include
automatically. The "popularity" of a document, i.e., the
percentage of users accessing a document, is also used to assist
selection and ranking.

Alternatively, the bookmark system allows collection of
documents by "crawling". In one embodiment, parameters
specified for crawling include the number of levels of links
followed from a document. The bookmark system can
calculate an estimated time based on the number of links. In
addition, the bookmark system retrieves and presents to the
user sample documents for user consideration prior to
completing the crawling request. The bookmark system allows a
crawling request to be limited to the number of levels of
links to traverse from a seed document. Also, the crawling
request can be limited to within a specified domain.

According to another aspect of the present invention, the
bookmark system provides an efficient database management
system that includes folders, in addition to document
records. In that database system, records are related to each
other by pointers, so as to facilitate database operations. The
operations of the bookmark management system are
achieved by traversal of pointers to document records and
folders. For example, when a page has an access pattern
satisfying certain [illegible], the bookmark management
system can [illegible] special purpose folder by simply
associating the folder with a pointer. Such folders can include
[illegible] folders, hot link folders, etc. Subscription folders
can also be set up [illegible] periodically updated information
for selected bookmarks. The subscribing users are notified
when new or updated information is available.

The present invention is better understood upon consideration
of the detailed description below and the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows one embodiment of the present invention in
hypermedia database system 100.

FIG. 2 shows query interface 200 of PowerBookmarks,
which simplifies interfacing to diverse query interfaces,
layout, terminology, and services offered by different search
engines.

FIG. 3 shows an example of query results returned to
query interface 200.

FIG. 4 shows in query interface 200 options to allow
organization of query results.

FIG. 5 shows a specification window in PowerBookmarks
for specifying a crawling request.

FIG. 6 shows sample results 600 of the crawling request
of FIG. 5.

FIG. 7a shows the metadata associated with document
700 in one embodiment of the present invention.

FIG. 7b is a table showing the document-specific metadata
of document 700.

FIG. 8a shows, based on the document model of the
present invention, PowerBookmarks providing a different
view of bookmark 800 to different users John, Mary and
Peter.

FIG. 8b shows owner John's view of bookmark 800 of
FIG. 8a.

FIG. 8c shows owner Peter's and user Mary's view of
bookmark 800 of FIG. 8a.

FIG. 9 illustrates an index structure 900, which is
designed for efficient processing for navigation requests.

FIG. 10 shows iconized representations of various documents
used in a user interface of PowerBookmarks.

FIG. 11 shows, in display windows 1101 and 1102, two
navigation trees 1110 and 1120, corresponding respectively
to navigation trees for a public bookmark database and a
private bookmark database.

FIG. 12a shows a classification for a document containing
keywords "sports", "car", "import", and "acura" under the
LCC scheme.

FIG. 12b shows a classification for the document of FIG.
12a under an internet search engine Infoseek.

FIG. 13 illustrates the classification categories received
from a classifier using the keywords "Web" and "Database".

FIG. 14 shows a page displaying documents in folders in
the Computer/Software category and page 1402 displaying
documents in folders in the Computer/Software/Database
category.

FIG. 15 shows pages 1501 and 1502 that display metadata
records of two categories in a navigation tree.

FIG. 16 shows query interface 1600.

FIG. 17 shows an example of a subscription definition for
the folder "San Jose Festivals".

FIG. 18 shows a subscription folder 1801 and its enclosed
documents 1802.

FIG. 19 shows preference setup window 1800 for a user
to define personal preferences.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

One embodiment of the present invention is provided in
a system that includes a web database ("WebDB") which is
described in the '759 Patent Application. The present
invention is based on the concept of "augmented hypermedia",
i.e., a system which extracts useful meta-data (from accessed
URLs) and observes user behavior to provide valuable
personalized services. Unlike prior art bookmarking
schemes, the present invention allows sharing of
information, provides access control and supports querying
and automated bookmark classification based on the contents
of the underlying documents. In addition, many useful
personalized services, such as automated bookmarking,
bookmark expiration, and document subscription, can be
provided.

FIG. 1 shows one embodiment of the present invention in
database system 100. As shown in FIG. 1, database system
100 includes a logical database "WebDB" 101, which is
built on top of a physical object-oriented database management
system ("OODBMS") 102, which can be implemented
by the NEC PERCIO OODBMS. Unlike most search
engines, which focus on information retrieval based on
keywords, WebDB 101 supports database-like comprehensive
query processing and allows a user to navigate document
structures, contents, and linkage information. Hypermedia
database 100 utilizes the query, modeling, and
navigation capabilities provided by WebDB 101 to provide
information sharing, access control, and customization
services.

WebDB 101, which is described in the Copending Application
incorporated by reference above, includes modules
103 for logical Web document modeling and storage, query
language processor 104, and HTML/VRML document
generator 105. Physical OODBMS 102 includes modules 106
for internal class representations, an object depository, query
processor 107, and a query result class generator 108. Two
external components, full text search engine 109 and an
on-line lexical dictionary 110, are provided to perform full
text search and to serve as an on-line dictionary reference for
such tasks as indexing and query expansion. Full text search
engine 109 and on-line lexical dictionary 110 can be
implemented by JTOPIC from NEC and Wordnet, known to those
skilled in the art.

WebDB 101 can be queried using a query language WQL
(Web Query Language) for document query and
manipulation, which is interpreted by query processor 104.
WQL is modeled after query language SQL3, known to
those skilled in the art. WQL extends the traditional tables
of relational databases and classes of object-oriented
databases by additional data management functions which are
optimized for document formats and navigation. A statement
in WQL contains two parts: a "SELECT . . . FROM . . .
WHERE" clause for specifying retrieval of data contents
from hypermedia database 100 and a "CREATE . . . AS . . ."
clause for specifying the output HTML format and navigation
of the results. In WebDB 101, HTML documents
are logically modeled as object-oriented hierarchical
structures, while physically modeled and stored in the
underlying NEC PERCIO OODBMS as classes. Modules 103 of
WebDB 101 are mapped to the classes of modules 106
according to a logical/physical schema maintained for query
translation. A visual query interface (not shown) is
supported to assist users in specifying queries. Actual WQL
queries are then generated automatically by a WQL query
generator. Hence, the complexities of the underlying schema
and the query language remain transparent to the user.

Queries in WQL are translated into their corresponding
internal query tree representations 120 for processing
against the object-oriented class schema. WQL parser 104
translates the WQL queries according to the logical or
physical schema. After query processor 107 completes query
processing, the results are returned by the physical
OODBMS 102 in internal object-oriented class format.
HTML/VRML document generator 108 then converts the
query results from their internal representations to their
corresponding HTML/VRML forms.

In database system 100, bookmark management system
121 ("PowerBookmarks") provides application level
services and personalized services. PowerBookmarks accesses
both the Internet and an intranet and allows information
sharing amongst multiple users. Some of the PowerBookmarks
services include subscription (124), access control
(125), query processing (126), document classification
(127), personalization (128), navigation (129), information
sharing (130), and bookmark management (131). Each of
these services is discussed in further detail below.
PowerBookmarks thus serves as an integrated environment for
Web information management and access. PowerBookmarks
interacts with two external components: proxy server
122 and classifier 123. Proxy server 122 collects a user's
navigation and browsing history to allow PowerBookmarks
to automatically adjust for different usage patterns, as well
as to provide for an automated bookmarking service
explained in further detail below. Classifier 123 performs
document classification.

PowerBookmarks allows bookmarks to be shared and
accessed by different users. Three types of records are
maintained for a bookmarked URL: "document-specific
metadata", "owner-specific metadata" and "user-specific
information". FIG. 7 shows the metadata associated with a
document in one embodiment of the present invention. As
shown in FIG. 7a, document 700 is associated with a set of
document-specific metadata which consists of fields "URL",
"title", "FullText_Contents", "Summary", "Keywords",
"Link_in URL", "Link_out URL", "last modified date",
"Last refreshed date", "Dead_link" and "Category". The
definitions of these fields are provided in Table 1 of FIG. 7b.
Specifically, in this embodiment, field "FullText_Contents"
is the index identifier returned by JTOPIC full text search
engine 109 when a document is indexed in JTOPIC. When
a user issues a query for a full text search, JTOPIC returns
a set of index identifiers for the documents matching the
query criteria. The fields "FullText_Contents" and "URL"
form the mapping between the metadata stored in WebDB
and JTOPIC.

The "last modified date" field provides the time stamp of
the document's last modification, which can be used as a
measure for the "freshness" of a document. The "refresh
frequency" field allows a user to set the frequency (e.g., in
days) at which the information about a specific document
bookmark is refreshed in the database. A refresh is
performed by invoking an incremental loader at specified time
intervals. A user can set the refresh frequency to "auto" to
allow PowerBookmarks to automatically adjust the refresh
frequency based on the values of "last modified date",
"access frequency", and "last refreshed date". During
refresh, if the system finds that a given URL has been
moved, the "Dead_link" field is set to "true". PowerBookmarks
allows a user to specify a criterion for automated
removal of dead links and inactive bookmarks. Inactive
bookmarks can be identified based on the values of
"Last_visited_date".
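The removal criterion described above can be illustrated with a minimal sketch. The field names follow the metadata fields named in the text; the class name `BookmarkState`, the function name `should_remove`, and the `max_idle_days` threshold are assumptions made for illustration and are not taken from the described system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BookmarkState:
    """Subset of the metadata fields relevant to automated removal."""
    url: str
    dead_link: bool           # "Dead_link" field, set during refresh
    last_visited_date: date   # "Last_visited_date" field


def should_remove(state: BookmarkState, today: date, max_idle_days: int) -> bool:
    """Return True for a dead link or a bookmark idle longer than the threshold.

    max_idle_days stands in for a hypothetical user-specified criterion.
    """
    idle = today - state.last_visited_date
    return state.dead_link or idle > timedelta(days=max_idle_days)
```

In this sketch a dead link is removed regardless of how recently it was visited; a real criterion could weigh the two conditions differently.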

Although different people can bookmark and access the
same URL, PowerBookmarks stores only one copy of a
document and its document-specific metadata. More than one
owner-specific metadata record, and more than one
user-specific metadata record, can be associated with each URL,
so that personalized service can be provided. An
"owner-specific metadata" record identifies the user in the
"owner_ID" field. The owner of the "owner-specific metadata"
record can provide his own "local_title" for, and can set
access control restrictions (e.g., "shared" or "private") on,
the document associated with the URL. Further, the fields
"Local_classification" and "Private_Tree Category" allow the
document to be classified under the owner's classification
scheme (discussed in further detail below). (Note that a
similar field "category" is provided in the
"document-specific metadata" record.) The "comments" field allows the
owner to associate personal comments with a bookmarked
document.

In this embodiment, user-specific metadata records are
maintained for the automated bookmarking services
discussed in further detail below. Typically, associated with each
user-specific metadata record is (a) the "user_ID" field,
specifying the identity of the user; (b) the
"access_frequency" field, storing the frequency at which the user
accesses the document of the URL; and (c) the "last_visited
date" field, indicating when the user last
accessed the document of the URL.
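The three record types can be pictured with the following sketch, which keeps one document record per URL and attaches any number of owner and user records to it. Only a subset of the fields named above is modeled; the class names, the `record_visit` helper, and the choice to store access frequency as a simple visit count are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DocumentMetadata:
    """Document-specific metadata: one record per bookmarked URL."""
    url: str
    title: str
    keywords: List[str] = field(default_factory=list)
    category: Optional[str] = None
    dead_link: bool = False


@dataclass
class OwnerMetadata:
    """Owner-specific metadata: one record per owner of the bookmark."""
    owner_id: str
    local_title: Optional[str] = None
    access: str = "private"          # "shared" or "private"
    comments: str = ""


@dataclass
class UserMetadata:
    """User-specific metadata: one record per user who accessed the URL."""
    user_id: str
    access_frequency: int = 0        # assumption: kept as a visit count
    last_visited_date: Optional[date] = None


@dataclass
class Bookmark:
    """A single stored document with its per-owner and per-user records."""
    document: DocumentMetadata
    owners: Dict[str, OwnerMetadata] = field(default_factory=dict)
    users: Dict[str, UserMetadata] = field(default_factory=dict)

    def record_visit(self, user_id: str, when: date) -> None:
        """Update (or create) the user-specific record for one access."""
        rec = self.users.setdefault(user_id, UserMetadata(user_id))
        rec.access_frequency += 1
        rec.last_visited_date = when
```

Keeping the owner and user records in dictionaries keyed by identifier reflects the one-document, many-records relationship described in the text.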

As noted above, "category" fields are provided in both
document-specific metadata records and "owner-specific
metadata" records. If a document is specified as shared, it may
be accessed in both the public bookmark database and
private bookmark databases. (As used in this context, these
databases can be implemented by "virtual databases", i.e.,
views.) However, the document may be classified into
different categories in the public and private databases. In
general, the public bookmark database has a larger number
of URLs, so that the classification in the public bookmark
database is typically of a finer classification granularity.

Based on this modeling, PowerBookmarks can provide a
different view of the same document to different users, as
illustrated in FIG. 8. As shown in FIG. 8a, bookmark 800 is
associated with a document-specific metadata record 806,
two owner-specific metadata records 801 and 802,
corresponding to users John and Peter, and three user-specific
records 803-805, corresponding to users John, Peter and
Mary. FIG. 8b shows owner John's view of bookmark 800
of FIG. 8a. Since owner Peter has designated bookmark 800
as shared, owner John sees both his own view (i.e.,
owner-specific metadata record 801) and owner Peter's view (i.e.,
owner-specific metadata record 802) of the bookmark. Thus,
owner John has access to the comments of owner Peter.
Since owner John has provided a local title (i.e., specified a
title in the "local title" field of owner-specific metadata
record 801), PowerBookmarks substitutes owner John's
local title for the title specified in document-specific
metadata record 806.

FIG. 8c shows owner Peter's and user Mary's views of
bookmark 800. Since owner John has designated his bookmark
on document 800 to be private, owner Peter sees only
his own owner-specific metadata record and the
document-specific metadata record to be associated with bookmark
800. Since owner Peter has specified his own bookmark to
be "shared", user Mary has access to owner Peter's
owner-specific metadata record 802 but not owner John's
owner-specific metadata record 801. In addition, user Mary is not
allowed to add comments to bookmark 800.
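The views of FIGS. 8b and 8c follow from two simple rules, sketched below: a viewer sees his or her own owner record plus any owner record marked shared, and an owner's local title replaces the document title in that owner's view. The names `OwnerRecord`, `visible_owner_records`, `displayed_title` and `may_comment` are hypothetical, and the rule that only owners may comment is inferred from the statement about user Mary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OwnerRecord:
    owner_id: str
    shared: bool
    local_title: Optional[str] = None
    comments: str = ""


def visible_owner_records(owners: Dict[str, OwnerRecord], viewer: str) -> List[OwnerRecord]:
    """Owner records a viewer may see: the viewer's own record plus shared ones."""
    return [rec for rec in owners.values() if rec.owner_id == viewer or rec.shared]


def displayed_title(doc_title: str, owners: Dict[str, OwnerRecord], viewer: str) -> str:
    """Use the viewer's local title when one exists, else the document title."""
    own = owners.get(viewer)
    if own is not None and own.local_title:
        return own.local_title
    return doc_title


def may_comment(owners: Dict[str, OwnerRecord], viewer: str) -> bool:
    """Assumption: only an owner of the bookmark may add comments."""
    return viewer in owners
```

With John's record private and Peter's shared, John sees both records, while Peter and Mary each see only Peter's.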

One advantage of implementing physical OODBMS 102 in
NEC's PERCIO OODBMS is its flexible modeling capability.
Specifically, PowerBookmarks takes advantage of
pointer-based operations, such as pointer traversal or
intersection of two sets of pointers, rather than the more
expensive join operations of relational database management
systems. FIG. 9 illustrates an index structure 900 of
PowerBookmarks, which is designed for efficient processing
for navigation requests, using the pointer operations of
physical OODBMS 102.

As shown in FIG. 9, index structure 900 has five types of
navigational nodes: "folders" (e.g., folders 901-902),
"documents" (e.g., documents 903-907), "keywords" (e.g.,
keyword 908), "user" (e.g., user 911) and "owner" (e.g.,
owner 910). Navigational nodes are interconnected by
pointers. For example, if a user accesses document 904 (labeled
"Doc X" in FIG. 9), the user can access, through
PowerBookmarks, all the document-specific metadata
records. Some possible navigations that the user may
perform are:

1. find documents which have keywords in common with
document 904. In this instance, PowerBookmarks
follows the "Doc_kwd_pointer" pointer 920 associated
with document 904 to navigational node ("keyword") 908
and "Kwd_doc_pointer" pointer 922 to reach document
[numeral illegible].

2. find documents with keywords related to keywords in
document 904. In this instance, PowerBookmarks follows
"Doc_kwd_pointer" pointer 920 of document 904 to
navigational node 908, and then follows
"Kwd_relatedKwd_pointer" 921 to navigational node
("Related keyword") 909 and then through
"Kwd_doc_pointer" to reach document 912.

3. find all documents which link to or are linked by
document 904. In this instance, PowerBookmarks
calculates the union of "Linkout_doc" and "Linkin_doc"
pointers 923 and 924 of document 904.

4. find all documents in the same category (i.e., folder) as
document 904. In this instance, PowerBookmarks
follows "Doc_folder_pointer" pointer 924 to reach
folder 901, and then follows "Folder_doc_pointer"
pointer 926 to reach document 903.
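The four navigations above can be sketched with an in-memory stand-in for the pointer-linked index, in which sets of node identifiers play the role of pointers. This is only an illustration of pointer traversal and set union; the class `BookmarkIndex`, its method names, and the string identifiers are assumptions and do not reflect the PERCIO implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Set

PointerMap = DefaultDict[str, Set[str]]


class BookmarkIndex:
    """Toy index: each map holds the pointers leaving one kind of node."""

    def __init__(self) -> None:
        self.doc_kwd: PointerMap = defaultdict(set)      # document -> keywords
        self.kwd_doc: PointerMap = defaultdict(set)      # keyword -> documents
        self.kwd_related: PointerMap = defaultdict(set)  # keyword -> related keywords
        self.link_out: PointerMap = defaultdict(set)     # document -> documents it links to
        self.link_in: PointerMap = defaultdict(set)      # document -> documents linking to it
        self.doc_folder: PointerMap = defaultdict(set)   # document -> folders
        self.folder_doc: PointerMap = defaultdict(set)   # folder -> documents

    def add_keyword(self, doc: str, kwd: str) -> None:
        self.doc_kwd[doc].add(kwd)
        self.kwd_doc[kwd].add(doc)

    def relate(self, kwd: str, other: str) -> None:
        self.kwd_related[kwd].add(other)
        self.kwd_related[other].add(kwd)

    def add_link(self, src: str, dst: str) -> None:
        self.link_out[src].add(dst)
        self.link_in[dst].add(src)

    def add_to_folder(self, doc: str, folder: str) -> None:
        self.doc_folder[doc].add(folder)
        self.folder_doc[folder].add(doc)

    @staticmethod
    def _hop(starts: Set[str], pointers: PointerMap) -> Set[str]:
        """Follow one kind of pointer from every start node."""
        found: Set[str] = set()
        for node in starts:
            found |= pointers.get(node, set())
        return found

    def same_keywords(self, doc: str) -> Set[str]:
        """Navigation 1: document -> keywords -> documents."""
        return self._hop(self.doc_kwd.get(doc, set()), self.kwd_doc) - {doc}

    def related_keywords(self, doc: str) -> Set[str]:
        """Navigation 2: document -> keywords -> related keywords -> documents."""
        related = self._hop(self.doc_kwd.get(doc, set()), self.kwd_related)
        return self._hop(related, self.kwd_doc) - {doc}

    def linked(self, doc: str) -> Set[str]:
        """Navigation 3: union of link-out and link-in pointers."""
        return self.link_out.get(doc, set()) | self.link_in.get(doc, set())

    def same_folder(self, doc: str) -> Set[str]:
        """Navigation 4: document -> folders -> documents."""
        return self._hop(self.doc_folder.get(doc, set()), self.folder_doc) - {doc}
```

Each navigation is a short chain of pointer hops, which is the property the text contrasts with join operations.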

PowerBookmarks supports three ways for collecting
bookmarks (i.e., universal resource locators (URLs) which
point to web documents). First, bookmarks are collected
through an interactive search or navigation on the Internet.
Second, bookmarks can be collected by a batch search or
navigation process called "crawling". Third, bookmarks can
be collected automatically by PowerBookmarks.

Typically, a user collects the URL of a document of
particular interest interactively. To enable interactive search,
a search engine usually provides a set of services for query
of web information. Some of these services are offered only
in certain search engines. PowerBookmarks offers uniform
query interface 200 (shown in FIG. 2) which simplifies
interfacing to diverse query interfaces, layout, terminology,
and services offered by different search engines.
Consequently, a PowerBookmarks user need not be
concerned with the heterogeneity of search engines. Query
interface 200 can be customized based on the user's
preferences. Query interface 200 forwards a user's queries to a
corresponding search engine. For example, results for queries
related to link-in or temporal relations can be obtained only from
certain search engines. Queries on classification categories
are forwarded to a classifier, such as some search engines on
the Internet, which maintains a classification scheme and a
larger collection of documents already categorized.
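One way to picture the forwarding step is a lookup of which back ends support the features a query needs. The sketch below is hypothetical throughout: the engine names, the feature labels, and the `route` function are invented for illustration, and the text does not specify how query interface 200 selects an engine.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical back ends and the query features each one supports.
ENGINE_FEATURES: Dict[str, FrozenSet[str]] = {
    "engine_a": frozenset({"keyword", "title"}),
    "engine_b": frozenset({"keyword", "link_in", "date"}),
    "classifier": frozenset({"category"}),
}


def route(required: Set[str], engines: Dict[str, FrozenSet[str]] = ENGINE_FEATURES) -> List[str]:
    """Return the names (sorted) of engines supporting every required feature."""
    return sorted(name for name, feats in engines.items() if required <= feats)
```

A query needing link-in information would reach only `engine_b` here, mirroring the remark that such queries can be served only by certain search engines.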

Upon receiving results to a query from a search engine,
PowerBookmarks extracts metadata from the query results
returned. FIG. 3 shows an example of query results returned
to query interface 200 which, because of its simplicity, is
easier to read compared with the results typically returned
by the Internet search engines. Query interface 200 allows a
user to customize the result presentation format based on the
user's preference.

With query interface 200, a user can select multiple URLs
for browsing in a "slide show" fashion. The user can also
press a button to collect URLs of interest into PowerBookmarks.
When such a user request is issued, the system
performs a sequence of tasks as follows: (1) downloading
the documents pointed to by the collected URLs; (2) parsing
metadata, such as links, keywords, and summary, from the
collected URLs; (3) indexing the collected URLs into formats
usable by JTOPIC and WebDB 100; and (4) classifying
the collected URLs into categories.

PowerBookmarks provides various services to help users
organize query results. FIG. 4 shows, in query interface 200,
options for organizing query results.

Crawling can be seen as a "batch" mode of collecting Web 
documents, which allows a user to collect a number of 
documents. In PowerBookmarks, crawling is accomplished 
by using Internet search engines. FIG. 5 shows a specifica- 
tion window in query interface 200 for specifying a crawling 
request. 

Crawling is achieved in PowerBookmarks by a number of 
steps. First, PowerBookmarks obtains one or more seed 
URLs. As shown at screen portion 501 of FIG. 5, a user can
specify a set of criteria which identify the seed URLs. The 
criteria include title, URL, keywords, anchors, and publica- 
tion date (i.e. last modified date). Based on these specified 
criteria, PowerBookmarks generates queries and forwards 
them to one or more web search engines. URLs meeting the 
specified criteria are then returned by the web search 
engines. These URLs are seed URLs for the crawling. 

Second, PowerBookmarks traverses the links of the seed
URLs. Screen portion 502 of FIG. 5 allows a user to specify
one or more traversal strategies. Specifically, in
PowerBookmarks, the crawling strategies include traversing
a specified number of levels of links pointing to the documents
of the seed URLs, and traversing a specified number
of levels of links pointed to by the documents of the seed
URLs. The crawling can also be restricted to traversal of no
more than a specified number of URLs. To traverse links
pointed to by documents of the seed URLs, the documents
of the seed URLs are downloaded and parsed. To traverse links
pointing to the seed URLs, Internet search engines are queried
for the documents which point to the seed URLs. If the number
of levels for link traversal is greater than 1, the URLs of
documents downloaded in each level of links are used as seed
URLs for the next level of links to be traversed. This procedure
is applied until the specified number of levels of links is
traversed. The user can also confine the crawling to within the
same domain as the seed URLs, or a specified domain.
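
The level-by-level traversal described above can be sketched as a breadth-first walk. This is a minimal sketch under stated assumptions: `links_out` is a hypothetical in-memory mapping that stands in for downloading and parsing a document, the function name and parameters are illustrative, and only outward links are followed (inward traversal would use the same loop with a link-in lookup answered by a search engine).

```python
from urllib.parse import urlparse


def crawl(seeds, links_out, levels, same_domain=False, max_urls=None):
    """Breadth-first traversal of outward links, one level at a time.

    links_out: hypothetical dict mapping a URL to the URLs its document links to.
    levels: number of levels of links to traverse from the seed URLs.
    same_domain: if True, keep only URLs on the same hosts as the seeds.
    max_urls: optional cap on the total number of URLs collected.
    """
    seed_hosts = {urlparse(u).netloc for u in seeds}
    visited = list(dict.fromkeys(seeds))   # keep order, drop duplicate seeds
    seen = set(visited)
    frontier = list(visited)
    for _ in range(levels):
        next_frontier = []
        for url in frontier:
            for target in links_out.get(url, []):
                if target in seen:
                    continue               # backward link to an already collected URL
                if same_domain and urlparse(target).netloc not in seed_hosts:
                    continue
                if max_urls is not None and len(visited) >= max_urls:
                    return visited
                seen.add(target)
                visited.append(target)
                next_frontier.append(target)
        frontier = next_frontier           # this level's URLs seed the next level
    return visited


# Hypothetical link graph used only for illustration.
GRAPH = {
    "http://a.com/": ["http://a.com/x", "http://b.com/"],
    "http://a.com/x": ["http://a.com/y", "http://a.com/"],
    "http://b.com/": ["http://c.com/"],
}
```

Calling `crawl(["http://a.com/"], GRAPH, 2, same_domain=True)` keeps the walk on `a.com`, which illustrates how the domain restriction shrinks the crawling space.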

At screen portion 503, a user can specify a system for 
storing and indexing the crawling results in the database. 
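
Because each additional level of links multiplies the number of URLs, the size of a crawl can be estimated before it is run. The following is a minimal sketch, assuming average out-degrees and in-degrees per level are already known (for example from earlier crawls). The function and parameter names are hypothetical; the arithmetic applies the Count, OutDegree and InDegree parameters of this description literally, so only the deepest level in each direction is added to the seed count.

```python
def level_count(seed_count, degrees):
    """Count at level n via Count(n) = Count(n-1) * average degree at level n-1.

    degrees: one assumed average degree per level; its length is the number of levels.
    """
    if not degrees:
        return 0.0                 # no levels requested: nothing beyond the seeds
    count = float(seed_count)
    for degree in degrees:
        count *= degree
    return count


def estimate_crawl(seed_count, out_degrees, in_degrees, seconds_per_doc):
    """Return (estimated number of URLs, estimated time in seconds).

    Implements Count(S(Q)) + Count(S_inward) + Count(S_outward), then multiplies
    by an assumed average processing time per document.
    """
    urls = (seed_count
            + level_count(seed_count, in_degrees)
            + level_count(seed_count, out_degrees))
    return urls, urls * seconds_per_doc
```

For instance, `estimate_crawl(10, [5, 3], [2], 1.5)` models 10 seed URLs, two outward levels with average out-degrees 5 and 3, and one inward level with average in-degree 2.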

Since crawling is a time-consuming task, PowerBookmarks
provides useful feedback information to let a user
decide if the crawling task should be carried out as specified.
The feedback information includes sample URLs, an estimated
number of URLs to be crawled, and an estimated time
remaining for completing the crawl. To provide sample
URLs, PowerBookmarks provides a subset of the crawling
results to the user. As shown in FIG. 6, PowerBookmarks
provides 10 sample crawling results, based on the specification
in screen portion 501 of FIG. 5. The user can then
examine the contents of the samples to judge if the crawling
results are of interest to him or her.

The estimated number of URLs to be crawled and the time
required are provided to let the user determine whether or
not the number of URLs remaining to be crawled is within
his or her expectation, and if the time required to complete the
crawl is acceptable. Based on the estimation, the user can
then refine or relax the crawling specifications. To estimate
the number of URLs to crawl, the following parameters are
defined:

1. The list of seed URLs, denoted by S(Q).

2. The number of seed URLs in S(Q), denoted by
Count(S(Q)).

3. The list of seed URLs not including S(Q), derived by
traversing n levels of link from S(Q), denoted by
S_outward(Q, n, d), where d is either 0 or 1 representing,
respectively, whether the crawling procedure is to be
carried out in all domains or in the same domain.

4. The list of seed URLs derived by traversing n levels of
link pointing into S(Q), denoted by S_inward(Q, n, d),
where d is either 0 or 1 representing, respectively,
whether the crawling procedure is to be carried out in all
domains or in the same domain.

5. The average number of outward links from S(Q) at
depth n, not including backward links, denoted by
OutDegree(S(Q), n, d), where d is either 0 or 1
representing, respectively, whether the crawling procedure
is to be carried out in all domains or in the same
domain. Note that OutDegree(S(Q), n, d) is the same as
OutDegree(S_outward(Q, n-1, d), 1, d).

The average number of inward links pointing into S(Q)
from URLs n levels away, not including backward
links, denoted by InDegree(S(Q), n, d), where d is
either 0 or 1 representing, respectively, whether the
crawling procedure is to be carried out in all domains
or in the same domain. Note that InDegree(S(Q), n, d)
is the same as InDegree(S_inward(Q, n-1, d), 1, d).

6. Number of levels to crawl following outward links,
denoted by L_outward.

7. Number of levels to crawl following inward links,
denoted by L_inward.

Count(S_outward(Q, n, d)) can be estimated by
Count(S_outward(Q, n-1, d)) * OutDegree(S_outward(Q, n-1, d), 1, d).
Similarly, Count(S_inward(Q, n, d)) can be estimated by
Count(S_inward(Q, n-1, d)) * InDegree(S_inward(Q, n-1, d), 1, d).
Given a query Q, the number of levels of outward links L_outward,
the number of levels of inward links L_inward, and domain d, the
estimated number of URLs to be crawled, denoted by
Count(Q, L_outward, L_inward, d), is given by:

Count(Q, L_outward, L_inward, d) = Count(S(Q)) + Count(S_inward(Q, L_inward, d)) + Count(S_outward(Q, L_outward, d))

The estimated time to complete a crawling task can then be
calculated by multiplying Count(Q, L_outward, L_inward, d) by
the average time for processing a document. Since the
average time to download a document is much greater than
the average time to extract links and to ascertain a backward
link, the average time for processing a document is close to
the average time required for downloading a document. As
the size of the crawling space increases very quickly as the
number of levels of links to traverse increases, limiting the
crawling to within the same domain can sometimes be preferable.
Note that the percentage of links in the seed URLs
pointing to other domains is much higher than the percentage
of links in the second level URLs pointing to other
domains. Further, in one experiment, about 12.5 percent of
documents could not be downloaded within reasonable time
due to server errors, network errors, or the documents having
moved. Empirical data of this kind can be used to increase
the accuracy of the time estimate. The URLs and meta-data
resulting from the crawling request are stored in the specified
database (specified by field 503 of FIG. 5).

In addition to collecting bookmarks interactively and by
crawling, PowerBookmarks provides an automated bookmarking
service. To accomplish automated bookmarking,
proxy server 122 (FIG. 1) tracks user Internet access behaviors.
In addition to its role as a proxy server for web access,
proxy server 122 provides an intelligent history management
tool, keeping the following information for each URL:

1. number and dates of visits to the URL;

2. URLs referring (i.e. navigating) to this URL;

3. URLs referred from this URL; and

4. dates on which such navigation occurs.

Since the pages a user views frequently are likely to be
revisited frequently in the future, PowerBookmarks automatically
bookmarks URLs with an access frequency higher
than a specified value over a specified time period. In
addition, PowerBookmarks provides a more sophisticated
automated bookmarking service taking into consideration a
user's navigation behavior and the associations between the
URLs being accessed and existing bookmarks, since visits to
related documents are often correlated. To identify URLs for
bookmarking, PowerBookmarks calculates for each URL a
"page rank" using the access frequency and the link structures
of the document associated with the URL. To exploit
link structures, a "connectedness" measure is used to quantify
the importance of related pages. Connectedness is
defined as the number of pages a user can reach from or to
a page within a predefined distance expressed in the number
of links [...] and the connectedness.

To provide automated bookmarking, PowerBookmarks
performs the following steps:

1. partitioning the collected access history on the proxy
server into site clusters, according to host names.

2. calculating the page rank for each URL. (In one
embodiment, only links within a distance of 2 from the
URL are considered).

3. bookmarking the URL when its page rank exceeds a
given threshold.

Taking advantage of both access frequency and the link
structure, this method is more likely to bookmark those
pages having a high probability of being revisited. This
method is superior to a method which considers only access
frequency of each URL or which evaluates each URL independently
of pages which have few associations with the
accessed URLs. When URLs are independently evaluated,
an index page and the content pages referred to by the index
page are equally likely to be bookmarked.

PowerBookmarks allows users to share bookmarks.
Shared bookmarks in PowerBookmarks can be viewed as a
public and virtual collection of bookmarks for all users.
There are five different types of documents in
PowerBookmarks, classified according to ownership, access
control specification, and other attributes. FIG. 10 shows
iconized representations of various documents used in a user
interface of PowerBookmarks.
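
Returning to the automated bookmarking steps described earlier in this section, the following is a minimal sketch of how they could fit together. It assumes page rank is the product of access frequency and connectedness (the product form is the one recited in the claims), treats links as undirected when counting reachable pages, and restricts links to those inside a site cluster; those simplifications, and all function and variable names, are assumptions made for illustration.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def connectedness(url, links, max_distance=2):
    """Number of pages reachable from or to `url` within max_distance links.

    links: set of (source, target) URL pairs taken from the access history.
    """
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)   # pages reachable from src
        neighbours[dst].add(src)   # pages from which dst is reachable
    seen, queue = {url}, deque([(url, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_distance:
            continue
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return len(seen) - 1           # do not count the page itself


def auto_bookmark(access_counts, links, threshold):
    """Return the URLs whose page rank exceeds the threshold, sorted by URL."""
    clusters = defaultdict(list)
    for url in access_counts:      # step 1: site clusters by host name
        clusters[urlparse(url).netloc].append(url)
    chosen = []
    for host, urls in clusters.items():
        site_links = {(s, t) for s, t in links
                      if urlparse(s).netloc == host and urlparse(t).netloc == host}
        for url in urls:
            rank = access_counts[url] * connectedness(url, site_links)  # step 2
            if rank > threshold:   # step 3
                chosen.append(url)
    return sorted(chosen)


# Hypothetical access history: an isolated page visited often, and a small linked site.
COUNTS = {"http://s.com/index": 3, "http://s.com/a": 3,
          "http://s.com/b": 1, "http://t.com/": 9}
LINKS = {("http://s.com/index", "http://s.com/a"),
         ("http://s.com/index", "http://s.com/b")}
```

In this example the isolated page `http://t.com/` has the highest access count but no connectedness, so a frequency-only rule would bookmark it while the combined measure does not.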

In FIG. 10, icons 1001, 1002, and 1003 represent,
respectively, an owner's private bookmarks, an owner's
shared bookmarks, and other people's shared bookmarks.
Icon 1004 represents subscribed documents. PowerBookmarks
allows a user to specify certain query criteria for
subscribing to new or updated documents in the Internet or
intranet. Subscribed documents have no owner-specific or
user-specific metadata records, and are classified into subscription
folders until they are deleted or bookmarked into
PowerBookmarks.

Icon 1005 represents deadlink documents. In the course of
performing automated document refresh, PowerBookmarks
occasionally finds that documents have moved. In this embodiment,
PowerBookmarks marks the moved documents as
"deadlinks" and indicates each such document by icon
1005.

In PowerBookmarks, a folder is defined as a container for 
a set of documents, a set of sub-folders, or a combination of 
documents and sub-folders. Four types of folders are defined 
in PowerBookmarks: 

1. "Hot List Folder" — a collection of the most frequently 
accessed bookmarks for each user. The URLs in a hot 
list folder are automatically maintained by PowerBook- 
marks to allow the user fast access (i.e., "shortcut") to 
his or her most frequently used bookmarked URLs. A 
hot list folder is represented by icon 1006.

2. "Deleted Bookmark Folder" — a folder for deleted
bookmarks. A user can set a preference for automated
removal of "dead links" or "inactive" bookmarks,
whose access frequency is lower than a preset threshold
value. A Deleted Bookmark Folder is represented by
icon 1009.

3. "Subscription Folder" — a subscription folder is func- 
tionally the same as a regular folder, except that when 
a new document is introduced into a subscription folder 
since the user's last visit, icon 1008 (rather than icon 
1007) is used. 

4. "Bookmark Folder"— a bookmark folder includes 
bookmarks to PowerBookmark's automated book- 
marking services. As discussed above, automated docu- 
ment bookmarking services can be provided according 
to content -based classifications, which are discussed in 
more detail below. 

In PowerBookmarks, documents are classified under a 
hierarchical classification structure ("classification tree" or 
"navigation tree"), such as shown in FIG. 9. FIG. 11 shows, 
in display windows 1101 and 1102, two navigation trees 
1110 and 1120, corresponding to navigation trees for a 
public bookmark database and a private bookmark database. 
FIG. 11 displays for each node both the number of bookmarks
in the folder and the access frequency of each folder.
In addition, a temperature icon is shown alongside each node
in navigation tree 1110, to graphically indicate the access
frequency of each node. As shown in FIG. 11, the URLs of
most interest are "computers job fairs", "computer game
companies" and "database conferences".

In each navigation tree, each node is represented as a
bookmark folder. PowerBookmarks provides automated
document classification using an external classifier (e.g., the
Pharos system, which is based on the Library of Congress
Classification (LCC)). FIGS. 12a and 12b show,
respectively, a classification for a document containing
keywords "sports", "car", "import", and "acura" under LCC
and a classification for the same document under an internet
search engine. In FIG. 12a, each LCC ID represents a node
in the LCC hierarchical structure. The label of a node is a tag
along the path from the top-level root node to the node into
which the document is classified. However, categories provided
by many classifiers are too fine, e.g., 6 to 7 levels.
While classifying to such fine categories provides an accurate
classification of the subject matter, such a classification hierarchy
is not convenient for a user to navigate because, to
reach a document, many steps have to be taken to traverse
the classification tree. In fact, many usability studies have
pointed out that a deep hierarchy results in inefficient
information retrieval because of the numerous traversal
steps required and the tendencies of users to make mistakes
along the way.

Using the observations that (a) a typical user's bookmark
collection contains fewer than a thousand URLs, and (b) a
large collection of shared bookmarks may have up to a few
thousand URLs, PowerBookmarks provides navigation
trees which are adjusted according to the numbers of documents
in collections, user preferences (e.g. breadth of
the navigation tree), and user behavior (e.g. document access
frequency). Accordingly, PowerBookmarks provides navigation
trees typically of a depth of 3 or 4 levels, so as to
ensure high usability. PowerBookmarks constructs a navigation
tree dynamically for efficient navigation, so that the
number of traversal steps is minimized, but without compromising
accuracy of classification.

The Navigation Trees Patent Application (incorporated by 
reference above) provides a procedure that constructs and 
dynamically maintains a navigation tree according to preset 
breadth. The procedure creates and deletes sub-nodes to an 
existing node when required (i.e., when certain preset con- 
ditions are satisfied), when new documents are created and 
inserted. 

FIG. 13 illustrates the classification categories received
from a classifier using the keywords "Web" and "Database".
Under the procedure described in the Navigation Trees
Patent Application, PowerBookmarks may place the document
into the categories "Computers and Internet: Internet",
"Computers and Internet: Software" and "Regional: Countries",
respectively, instead of the seven categories
returned by the classifier, if each of these categories includes
a number of bookmarks less than a predetermined value.

In addition, to take the users' access patterns into
consideration, when splitting a node, PowerBookmarks keeps
frequently accessed documents in the node, while pushing
less frequently accessed documents to the lower new level.
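
The access-aware split can be illustrated as follows. This is only a sketch: the function name, the `keep` parameter (how many documents stay in the node), and the use of a plain dictionary of access counts are assumptions, and the actual conditions that trigger a split are those of the Navigation Trees Patent Application.

```python
def split_node(docs, access_freq, keep):
    """Split a node's documents into (kept, pushed_down).

    docs: document identifiers currently in the node.
    access_freq: hypothetical mapping from document to access count.
    keep: number of most frequently accessed documents that stay in the node.
    """
    ranked = sorted(docs, key=lambda d: access_freq.get(d, 0), reverse=True)
    return ranked[:keep], ranked[keep:]
```

For example, with access counts `{"a": 1, "b": 9, "c": 4, "d": 0}` and `keep=2`, documents `b` and `c` stay in the node while `a` and `d` move to the new lower level.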

Deleting a document is the reverse of the insertion operation
described above.

PowerBookmarks provides both efficient navigation and 
complex query processing. Some of the relations among 
documents, folders, keywords, users, and owners are illus- 
trated above in FIG. 9. Under such organization, Power- 
Bookmarks provides fast response time for navigation with- 
out relatively expensive query processing. As discussed 
above, FIG. 11 shows navigation trees 1110 and 1120. A user 
can select a category to access bookmarks in that category 
(folder). FIG. 14 shows page 1401 listing the documents and 
folders in the "Computer/Software" category and page 1402 
listing the documents and folders in the "Computer/ 
Software/Database". Page 1401 lists not only the user's own 
bookmarks (shared and private), but also shared bookmarks 
owned by other users. In addition, dead links detected by the 
system are also reported using the appropriate icons. 

As discussed, when enabled by the user, PowerBookmarks
automatically moves dead links and inactive bookmarks
to the "Deleted folder" shown at the bottom of
page 1401. Navigation to a sub-node or subcategory is
achieved by selecting the corresponding folder for the
selected category. For example, when a user selects the
"Database" anchor of page 1401, page 1402 is brought up to
list the documents and folders of the category "Computer/
Software/Database", which is in the next level of the navigation
tree. Note that pages 1401 and 1402 list different sets of
metadata records, according to the different sets of user preferences
selected. Also, pages 1401 and 1402 show that the
"Hot List" and the "Deleted Bookmarks" folders are displayed
at both pages, since these icons are "shortcuts" to the
Hot List and the Deleted Bookmarks.

The order in which the documents within a page (e.g., 
pages 1401 and 1402) are listed is determined by the sorting 
criteria (e.g., by "last modified date") specified at the top- 
right of each of pages 1401 and 1402. The user can also 
select a bookmark from a page to view the actual HTML 
web document or its metadata records stored on WebDB. 
FIG. 15 shows pages 1501 and 1502 which display metadata
records of two categories in a navigation tree. As shown on
page 1501, the detailed metadata information for a bookmark
includes a summary and its most significant keywords.
Further, at the bottom of page 1501, comments are provided
by the owners of the bookmark, who specify page 1501 as
a shared bookmark. The user can select from the keyword
anchor, for example, the "Database" keyword, to navigate to
another bookmark page with such a keyword (e.g., page
1502). Page 1502 includes all documents with the keyword
"Database" and their respective classification categories. At
the bottom of page 1502, PowerBookmarks provides links to 
the documents with related keywords. The semantically 
similar keywords are generated by consulting an on-line 
lexical dictionary, such as WordNet. The syntactically 
related keywords are produced based on a word 
co-occurrence relationship analysis. 

In PowerBookmarks, another way to search for bookmarks
of interest is through a query. FIG. 16 shows query
interface 1600. For example, as shown in FIG. 16, a user
issues a query to retrieve bookmarks related to calls for
papers for conferences related to "XML". Queries with more
complex criteria, such as links, full text search, and related
keyword search, are also supported. Query processing in
PowerBookmarks is carried out by the underlying web
database WebDB. After the user clicks on the search button,
query interface 1600 automatically generates the corresponding
WQL query for the underlying query processing
engine, WebDB. In this example, the corresponding WQL
query generated for the specification of FIG. 16 is:

SELECT Document D1
FROM SUser
WHERE D1.URL LIKE "www.acm.org/*"
AND D1.Keywords mentions "conference", "XML", "CFP"
AND D1.Access_frequency>5.00
AND D1.Last_modified_date>"Jun. 1, 1998"

(SUser is a variable identifying the current user)
The default attributes returned by PowerBookmarks 
include URL, title, and ranking. Other organization services 
for users to browse through query results are also provided. 
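
As an illustration of how a form such as query interface 1600 might be turned into the WQL text shown above, the following sketch assembles the query from optional criteria. The function and its parameters are hypothetical; the clause syntax is copied from the example query, and no quoting or escaping of user input is attempted.

```python
def build_wql(url_pattern=None, keywords=(), min_access_frequency=None,
              modified_after=None):
    """Assemble a WQL query string in the shape of the example in this description."""
    conditions = []
    if url_pattern:
        conditions.append(f'D1.URL LIKE "{url_pattern}"')
    if keywords:
        quoted = ", ".join(f'"{k}"' for k in keywords)
        conditions.append(f"D1.Keywords mentions {quoted}")
    if min_access_frequency is not None:
        conditions.append(f"D1.Access_frequency>{min_access_frequency:.2f}")
    if modified_after:
        conditions.append(f'D1.Last_modified_date>"{modified_after}"')
    query = "SELECT Document D1\nFROM SUser"
    if conditions:
        # First condition follows WHERE; the rest are joined with AND.
        query += "\nWHERE " + "\nAND ".join(conditions)
    return query
```

Calling `build_wql("www.acm.org/*", ["conference", "XML", "CFP"], 5, "Jun. 1, 1998")` reproduces the example query line for line.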

In contrast to the "pull" mode where users actively seek 
information using queries or navigation, PowerBookmarks' 
subscription or notification service operates in the "push" 
mode — a user is notified when a specified document is 
modified or introduced on the Internet or an intranet. In 
PowerBookmarks, a user can set the subscription criteria,
such as "temporal", "domains", "keyword similarity", or
"document similarity". FIG. 17 shows an example of a
subscription definition for the folder "San Jose Festivals".
As shown in FIG. 17, the user specifies a subscription query
on the Internet. In particular, the user is interested in
documents related to the specified keywords that were
created or modified within the last two weeks. Alternatively,
the user can also provide a sample document to subscribe to
documents related to the sample document. Upon receiving
the sample document, PowerBookmarks extracts significant
keywords from the sample document and uses the extracted
significant keywords to create a subscription definition.

To support subscription or notification at the Internet
search engine level, PowerBookmarks uses a search engine
application program interface (API) that allows incremental
searches to be requested. Currently, one example of a search
engine that includes such an API is HotBot. In particular,
HotBot allows a user to query new documents that are
indexed during a two-week window. For a subscription of a
new or updated related document on an intranet, PowerBookmarks
can notify a subscribing user of the new or
updated document immediately upon introduction into the
intranet. Thus, for document subscriptions on an intranet, the
period of monitoring need not be specified. When a subscribed
document corresponding to a bookmark becomes
available, the user is notified (e.g. by replacing folder icon
1007 of FIG. 10 by the sparked folder icon 1008). FIG. 18
shows a display window which lists a number of subscription
folders, including subscription folder 1801, and a second
display window showing documents 1802 included in subscription
folder 1801.

PowerBookmarks allows personalization. FIG. 19 shows
a preference setup window 1900 for a user to define personal
preferences. Preference setup window 1900 allows two
types of preferences, display preference and bookmark
preference, to be specified.

Display preference parameters allow a user to customize
the metadata records shown in the query results or the
navigation pages. For example, PowerBookmarks shows
different sets of metadata in pages 1401 and 1402 of FIG. 14
described above. When a display preference parameter is
specified, the query interface automatically augments the list
of fields to project in the SELECT clause to be submitted to
WebDB. For example, for displaying page 1401, the
SELECT clause is "SELECT Doc.title, Doc.access_frequency,
Doc.last_refreshed_date", while for page 1402,
the SELECT clause is "SELECT Doc.title, Doc.URL".
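
A small sketch of this augmentation, assuming the title is always projected and the user's display preferences are given as a list of field names (both assumptions; the function name is hypothetical):

```python
def select_clause(display_fields):
    """Build the SELECT clause from the user's display preference fields."""
    # The title comes first; any preference field other than the title follows.
    fields = ["title"] + [f for f in display_fields if f != "title"]
    return "SELECT " + ", ".join(f"Doc.{f}" for f in fields)
```

With the preferences `["access_frequency", "last_refreshed_date"]` this yields the clause given for page 1401, and with `["URL"]` the clause given for page 1402.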

The bookmark preferences include (a) the maximum
depth and fanout parameters of the navigation trees, (b)
ranking preferences and (c) user pattern consideration periods.
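
For preference (c), the description later gives a recurrence that discounts older accesses by a temporal decay factor. The sketch below shows that recurrence under one stated assumption: the decay factor is taken as 0.01 raised to the power 1/period, which is the value that makes an access exactly one period old weigh 0.01, in line with the statement that weights prior to the period fall below 0.01. Function names and the list-of-daily-counts input are hypothetical.

```python
def decay_factor(period_days):
    """Assumed decay factor: after period_days days a weight has shrunk to 0.01."""
    return 0.01 ** (1.0 / period_days)


def access_frequency_scores(daily_counts, period_days):
    """Score(N) = Access_Frequency(N) + decay * Score(N-1), for each day in order.

    daily_counts: access counts per day, oldest first.
    """
    alpha = decay_factor(period_days)
    score, scores = 0.0, []
    for count in daily_counts:
        score = count + alpha * score
        scores.append(score)
    return scores
```

With a period of 14 days, a single access contributes 1 on the day it happens and about 0.01 fourteen days later, so older accesses are effectively ignored without a hard cutoff.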

With respect to the preferred depth and fanout parameters
for the navigation tree, the degree of fanout is set to 20 by
default to allow all the folders and documents to fit in the
screen without scrolling. Note that, as discussed above,
PowerBookmarks can merge multiple "branches" of a navigation
tree (i.e., categories) to reduce the depth of the
navigation tree as long as the constraint for maximum
degree of fanout is satisfied. By merging categories, the
number of navigation steps necessary to reach a given category is
minimized.

With respect to ranking preferences, PowerBookmarks
supports, in addition to the sorting schemes based on document
attributes (e.g., titles, URLs), three types of ranking
schemes based on metadata. Specifically, the metadata considered
includes referral, access frequency, and popularity.

The degree of "referral" is defined as the total number of
inward links to a document. The value of a "referral" can
be viewed as a measure of importance of such pages serving
as index pages for navigation (i.e. "landmark nodes"). The
number of referral links is derived during the indexing
phase.

Access frequency is defined as the number of accesses for
a page over a specified period of time. "Popularity" is
defined as the percentage of users accessing a page over a
specified period of time. Access frequency and popularity
provide different indications for the nature of the document.
For example, a document with a high "popularity" value but
a low access frequency value implies that the document
could be a bulletin type of announcement, but cannot be
used as an operational reference.

With respect to user pattern considered period, if a user
directs the system to consider only his or her usage pattern
in the past "pattern considered period" (say, 14 days),
PowerBookmarks ignores the user's access pattern from more
than 14 days ago. Based on the value of the "pattern considered
period", PowerBookmarks computes a temporal decay factor
value α which is between 0 and 1.

If the "pattern considered period" is specified, access
frequency for day N is calculated as follows:

Access_Frequency_Score(N) = Access_Frequency(N) + α * Access_Frequency_Score(N-1)

where α = 0.01^(1/pattern considered period)

This formula adjusts the weights for the access patterns,
and the weights for access patterns prior to the pattern
considered period are reduced to values less than 0.01. A
value of 1 for α makes the system treat all access patterns
equally, while a value of 0 makes the system consider only
the access patterns of yesterday.

The above detailed description is provided to illustrate the
specific embodiments of the present invention and is not
intended to be limiting. Numerous variations and modifications
within the scope of the present invention are possible.
The present invention is set forth in the appended claims.

We claim:

1. A bookmark system having access to a computer
network, comprising:

an interface to said computer network;

a database management system;

a bookmark management system coupled to said database
and said interface, said bookmark management system
creating and maintaining in said database a document
record containing information for locating a document
in said computer network, and for retrieving said document
from said computer network over said interface
using said information for locating a document wherein
said bookmark management system associates, for each
owner, said document record with an owner-specific
record and wherein said bookmark management system
associates, for each user, said document record with a
user-specific record.

2. A bookmark system as in claim 1, further comprising a
document classification system for associating said document
into one or more categories.

3. A bookmark system as in claim 2, wherein said document
classification system accesses a classifier program
on said computer network through said interface.

4. A bookmark system as in claim 2, wherein said categories
are leaf nodes of a hierarchical classification tree.

5. A bookmark system as in claim 2, wherein said database
system accesses a lexical dictionary for retrieving a list
of keywords relating to a document.

6. A bookmark system as in claim 2, wherein each of said
categories is a node of a navigation tree.

7. A bookmark system as in claim 6, wherein, in said
navigation tree, each category includes less than a predetermined
number of documents.

8. A bookmark system as in claim 7, wherein said navigation
tree is grown by providing child nodes to an existing
node when said predetermined number of documents is
exceeded in the category corresponding to said existing
node.

9. A bookmark system as in claim 1 wherein said interface
couples to a proxy server coupled to said computer network.

10. A bookmark system as in claim 9, wherein said proxy
server monitors, for each user, an access frequency for said
document.

11. A bookmark system as in claim 10, wherein said
bookmark management system automatically associates
identification information of a user with a document record
when said access frequency of a user exceeds a predetermined
number.

12. A bookmark system as in claim 10, wherein said
bookmark management system calculates for said document
a page rank, said page rank being a function of said access
frequency and a quantity related to documents referenced by
said document or referencing said document.

13. A bookmark system as in claim 12, wherein said
function is a product.

14. A bookmark system as in claim 10, wherein said
bookmark management system associates with said document
record an access pattern of said document.

15. A bookmark system as in claim 1, wherein said
owner-specific record indicates whether information on said
owner-specific record is shared.

16. A bookmark system as in claim 15, further comprising
a user interface through which a user accesses said bookmark
management system, said bookmark management system
presents to said user over said user interface a customized
view of said document according to information in said
owner-specific record and in said document record.

17. A bookmark system as in claim 15, further comprising
a graphical user interface, said graphical user interface
displaying for each document record shared information of
an owner-specific record associated with said document
record.

18. A bookmark system as in claim 17, wherein said
shared information comprises annotation.

19. A bookmark system as in claim 1, wherein said
bookmark management system associates with each document
only one document record.

20. A bookmark system as in claim 1, wherein said
bookmark management system collects documents by
crawling.

21. A bookmark system as in claim 20, wherein said
crawling is semantic-based.

22. A bookmark system as in claim 20, wherein said
crawling is domain independent.

23. A bookmark system as in claim 20, said crawling
being limited by the number of levels of links followed from
said document.

24. A bookmark system as in claim 20, wherein said
crawling calculates an estimated time based on said access
pattern.

25. A bookmark system as in claim 20, wherein said
crawling provides sample documents prior to completion of
said crawling.

26. A bookmark system as in claim 20, wherein said
crawling is conducted within a specified domain.
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27. A bookmark system as in claim 20, said crawling being limited by the number of levels of links pointing to said document.

28. A bookmark system as in claim 20, further comprising [illegible] parameters of said crawling.

29. A bookmark system as in claim 28, wherein said parameters include number of links to traverse from a seed document.

30. A bookmark system as in claim 28, wherein said parameters include number of levels of links to traverse.

31. A bookmark system as in claim 28, wherein said bookmark management system accessing and displaying a selected number of documents in said crawling prior to completion of said crawling.

32. A bookmark system as in claim 1, wherein said database management system includes a folder that relates said document and other folders by pointers.

33. A bookmark system as in claim 32 wherein said bookmark management system allows traversal of document records and said folders by pointers.

34. A bookmark system as in claim 32 wherein said bookmark management system maintains an access pattern for said document record, said bookmark management system associating said document record with said folder when said access pattern matches predetermined criteria.

35. A bookmark system as in claim 32, wherein a document record associated with said folder is marked for deletion.

36. A bookmark system as in claim 32, wherein said folder references document records having access frequencies exceeding a predetermined value.

37. A bookmark system as in claim 32, wherein said folder record references documents to be accessed on a regular basis.

38. A bookmark system as in claim 32, wherein said folder record is associated with documents to be accessed when introduced or updated, introducing or updating of said documents being ascertained by performing incremental search.

39. A bookmark system as in claim 38, wherein said bookmark management system informs a user when a document record referenced in said folder is updated.

40. A bookmark system as in claim 1, said bookmark management system provides a page rank based on an evaluation based on one or more of the following quantities: access frequency, popularity and number of referrals.

41. A bookmark system as in claim 1, further comprising a user configurable graphical user interface.

42. A bookmark system as in claim 41, wherein said user configurable graphical user interface customizing query to said database management system according to a configuration of said configurable graphical user interface.

43. A bookmark system as in claim 1, further comprising a graphical user interface, said graphical user interface displaying for each document record information of an owner-specific record associated with said document record.

44. A bookmark system as in claim 43, wherein said owner-specific record comprises a local title for said document record.

* * * * *
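
As an illustration only, the level-limited crawling recited in claims 23 and 30, together with the delivery of sample documents before completion recited in claim 25, might be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the in-memory `links` graph, the function name `crawl`, and the parameter `max_levels` are hypothetical names introduced for this example, and no network access is performed.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple


def crawl(links: Dict[str, List[str]], seed: str, max_levels: int) -> Iterator[Tuple[str, int]]:
    """Breadth-first crawl from `seed`, following at most `max_levels` levels of links.

    Yields (document, level) pairs as they are discovered, so a caller can
    show sample documents before the crawl has completed.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        doc, level = queue.popleft()
        yield doc, level
        if level == max_levels:
            continue  # do not follow links beyond the level limit
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))


# Hypothetical link graph: each key is a document, each value its outgoing links.
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}

# Collect every document reachable within two levels of links from the seed.
result = list(crawl(links, "seed", max_levels=2))
```

Because `crawl` is a generator, a caller can take the first few results with `next()` or `itertools.islice` while the rest of the crawl is still pending, which is one simple way to surface sample documents early.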
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